43 research outputs found

Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets.

Author: Ackermann Gail
Bolyen Evan
Caporaso J Gregory
Chase John H
González Antonio
Knight Rob
Rideout Jai Ram
Publication venue: eScholarship, University of California
Publication date: 01/06/2016

Background: Bioinformatics software often requires human-generated tabular text files as input and has specific requirements for how those data are formatted. Users frequently manage these data in spreadsheet programs, which is convenient for researchers who are compiling the requisite information because the spreadsheet programs can easily be used on different platforms including laptops and tablets, and because they provide a familiar interface. It is increasingly common for many different researchers to be involved in compiling these data, including study coordinators, clinicians, lab technicians and bioinformaticians. As a result, many research groups are shifting toward using cloud-based spreadsheet programs, such as Google Sheets, which support the concurrent editing of a single spreadsheet by different users working on different platforms. Most of the researchers who enter data are not familiar with the formatting requirements of the bioinformatics programs that will be used, so validating and correcting file formats is often a bottleneck prior to beginning bioinformatics analysis. Main text: We present Keemei, a Google Sheets Add-on, for validating tabular files used in bioinformatics analyses. Keemei is available free of charge from Google's Chrome Web Store. Keemei can be installed and run on any web browser supported by Google Sheets. Keemei currently supports the validation of two widely used tabular bioinformatics formats, the Quantitative Insights into Microbial Ecology (QIIME) sample metadata mapping file format and the Spatially Referenced Genetic Data (SRGD) format, but is designed to easily support the addition of others. Conclusions: Keemei will save researchers time and frustration by providing a convenient interface for tabular bioinformatics file format validation. By allowing everyone involved with data entry for a project to easily validate their data, it will reduce the validation and formatting bottlenecks that are commonly encountered when human-generated data files are first used with a bioinformatics system. Simplifying the validation of essential tabular data files, such as sample metadata, will reduce common errors and thereby improve the quality and reliability of research outcomes.

PubMed Central

eScholarship - University of California

Ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses.

Author: Bolyen Evan
Caporaso J Gregory
Chase John
Fouquier Jennifer
Kelley Scott T
Knight Rob
McDonald Daniel
Rideout Jai Ram
Shiffer Arron
Publication venue: eScholarship, University of California
Publication date: 01/01/2016

Background: Fungi play critical roles in many ecosystems, cause serious diseases in plants and animals, and pose significant threats to human health and structural integrity problems in built environments. While most fungal diversity remains unknown, the development of PCR primers for the internal transcribed spacer (ITS) combined with next-generation sequencing has substantially improved our ability to profile fungal microbial diversity. Although the high sequence variability in the ITS region facilitates more accurate species identification, it also makes multiple sequence alignment and phylogenetic analysis unreliable across evolutionarily distant fungi because the sequences are hard to align accurately. To address this issue, we created ghost-tree, a bioinformatics tool that integrates sequence data from two genetic markers into a single phylogenetic tree that can be used for diversity analyses. Our approach starts with a "foundation" phylogeny based on one genetic marker whose sequences can be aligned across organisms spanning divergent taxonomic groups (e.g., fungal families). Then, "extension" phylogenies are built for more closely related organisms (e.g., fungal species or strains) using a second more rapidly evolving genetic marker. These smaller phylogenies are then grafted onto the foundation tree by mapping taxonomic names such that each corresponding foundation-tree tip would branch into its new "extension tree" child. Results: We applied ghost-tree to graft fungal extension phylogenies derived from ITS sequences onto a foundation phylogeny derived from fungal 18S sequences. Our analysis of simulated and real fungal ITS data sets found that phylogenetic distances between fungal communities computed using ghost-tree phylogenies explained significantly more variance than non-phylogenetic distances. The phylogenetic metrics also improved our ability to distinguish small differences (effect sizes) between microbial communities, though results were similar to non-phylogenetic methods for larger effect sizes. Conclusions: The Silva/UNITE-based ghost tree presented here can be easily integrated into existing fungal analysis pipelines to enhance the resolution of fungal community differences and improve understanding of these communities in built environments. The ghost-tree software package can also be used to develop phylogenetic trees for other marker gene sets that afford different taxonomic resolution, or for bridging genome trees with amplicon trees. Availability: ghost-tree is pip-installable. All source code, documentation, and test code are available under the BSD license at https://github.com/JTFouquier/ghost-tree
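
The grafting step described in this abstract can be illustrated with a minimal sketch. This is not ghost-tree's code or API: the `Node` class, the `graft` function, and the taxon names are hypothetical, branch lengths are ignored, and the sketch assumes each foundation tip name maps directly to at most one extension tree.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A bare-bones tree node (hypothetical; no branch lengths)."""
    name: str
    children: list[Node] = field(default_factory=list)

    def tips(self) -> list[str]:
        """Return tip names in left-to-right order."""
        if not self.children:
            return [self.name]
        return [tip for child in self.children for tip in child.tips()]


def graft(foundation: Node, extensions: dict[str, Node]) -> Node:
    """Return a new tree in which every foundation tip that has a
    same-named extension tree branches into that tree's children.
    Tips without an extension are copied unchanged."""
    if not foundation.children:
        extension = extensions.get(foundation.name)
        if extension is None:
            return Node(foundation.name)
        return Node(foundation.name, list(extension.children))
    return Node(
        foundation.name,
        [graft(child, extensions) for child in foundation.children],
    )


# Foundation tree resolved to family level (names are made up).
foundation = Node("root", [Node("FamilyA"), Node("FamilyB")])
# One extension tree, keyed by the foundation tip it belongs to.
extensions = {"FamilyA": Node("FamilyA", [Node("sp1"), Node("sp2")])}

hybrid = graft(foundation, extensions)
print(hybrid.tips())
```

The sketch builds a new tree rather than modifying the foundation in place, so the same foundation could be reused with a different set of extension trees.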

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

American Gut: an Open Platform for Citizen Science Microbiome Research

Author: Abramson Max
Ackermann Gail
Ajami Nadim
Aksenov Alexander A.
Amir Amnon
Ball Madeleine
Behsaz Bahar
Berg-Lyons Donna
Bittinger Kyle
Blaser Martin
Bloss Cinnamon
Bobe Jason
Bolyen Evan
Brennan Caitriona
Bushman Frederic D.
Butyaev Alexander
Callewaert Chris
Cani Patrice D.
Caporaso J. Gregory
Chase John H.
Chen Yingfeng
Chia Nicholas
Chopra Deepak
Church George M.
Clemente Jose C.
Debelius Justine W.
DeRight Goldasich Lindsay
Dillon Lindsay
Dominguez-Bello Maria G.
Dorrestein Pieter C.
Drogaris Chris
Dunn Robert R.
Dutton Rachel
Eisen Jonathan A.
Eshoo-Anton Tifani W.
Fahimipour Ashkaan K.
Fang Xin
Fierer Noah
Frazier Angel
Gaffney James
Gajer Pawel
Galzerani Daniela Domingos
Gauglitz Julia
Geier Justin
German J. Bruce
Gevers Dirk
Gilbert Jack A.
Gogul Grant
Gonzalez Antonio
Gonzalez David J.
Goodrich Julia
Gottel Neil R.
Green Jessica L.
Gunderson Beau
Hampton-Marcell Jarrad T.
Heale Arthur Cole
Holmes Susan
Holscher Hanna D.
Hugenholtz Philip
Humphrey Greg
Huttenhower Curtis
Hyde Embriette
Jackson Matthew A.
Jacobs Julian
Jankowska Marta M.
Janssen Stefan
Jansson Janet K.
Jarmusch Alan
Jeste Dilip V.
Jiang Lingjing
Kashyap Purna
Kelley Scott T.
Kerr Jacqueline
Klemmer Scott
Knight Rob
Knights Dan
Kopylova Evguenia
Kosciolek Tomasz
Krajmalnik-Brown Rosa
Kurisu Mike
Ladau Joshua
Lauber Christian L.
Leach Jeff
Lebrilla Carlito
Lewis Cecil M.
Lewis James D.
Lewis Kim
Ley Ruth
Lovelace Elijah
Lowry Christopher A.
Lozupone Catherine
Mann Allison E.
Marotz Clarisse
Martino Cameron
Matamoros Sébastien
Mayer Kris
McDonald Daniel
Meleshko Dmitry
Melnik Alexey V.
Metcalf Jessica L.
Mills David A.
Mills Robert H.
Minson Michael
Mirarab Siavash
Mohimani Hosein
Monk Jonathan
Montassier Emmanuel
Moorman Stephanie
Morton James T.
Navas-Molina Jose
Nazarova Elena
Nguyen Dominic
Nguyen Tanya T.
Owens Sarah M.
Pandey Vineet
Park Rachel S.
Peddada Shyamal
Petrosino Joseph
Pevzner Pavel
Pierce Emily
Pirrung Meg
Pollard Katherine S.
Raes Jeroen
Rahnavard Gholamali
Raison Charles
Ravel Jacques
Reeve Nicolai
Robbins-Pianka Adam
Salido Benitez Rodolfo Antonio
Sanders Karenina
Sangwan Naseer
Saxe Gordon
Schriml Lynn M.
Schwartz Tara
Sears Dorothy D.
Shorenstein Joshua
Silva Ricardo
Smarr Larry
Song Se Jin
Spector Timothy
Strandwitz Philip
Suchodolski Jan S.
Swafford Austin D.
Swanson Kelly S.
TerAvest Emily
Thackray Varykina G.
Thompson Luke R.
Treuren Will Van
Tripathi Anupriya
Ugrina Ivo
Vigers Tim
Vrbanac Alison
Vázquez-Baeza Yoshiki
Waldispühl Jérôme
Warinner Christina
Wendel Doug
White Owen
Willner Dana
Wischmeyer Paul
Wolfe Elaine
Wozniak Jacob M.
Wu Gary D.
Xavier Ramnik J.
Zaramela Livia S.
Zech Xu Zhenjiang
Zengler Karsten
Zhang Chi
Zhu Qiyun
Publication venue: 'American Society for Microbiology'
Publication date: 01/01/2018

McDonald D, Hyde E, Debelius JW, et al. American Gut: an Open Platform for Citizen Science Microbiome Research. mSystems. 2018;3(3):e00031-18

Publications at Bielefeld University

q2-FMT

Author: Chloe Herman
Evan Bolyen
J. Gregory Caporaso
Liz Gehret
Publication venue
Publication date: 20/09/2023

q2-FMT is a software package with a suite of tools to enable microbiome researchers to quantify engraftment extent following Fecal Microbiota Transplant

ZENODO

q2-sample-classifier: machine-learning tools for microbiome classification and regression

Author: Bokulich Nicholas
Bolyen Evan
Caporaso J. Gregory
Dillon Matthew R.
Huttley Gavin A.
Kaehler Benjamin D.
Publication venue: s.n.
Publication date: 01/10/2018

ISSN:2475-906

Repository for Publications and Research Data

Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin

Author: Bokulich Nicholas A
Bolyen Evan
Caporaso James
Dillon Matthew
Huttley Gavin Austin
Kaehler Benjamin
Knight Rob
Rideout Jai Ram
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/03/2019

Background: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. Results: We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit (https://github.com/caporaso-lab/tax-credit-data). Conclusions: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub

The Australian National University

Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin

Author: Bokulich Nicholas
Bolyen Evan
Caporaso J. Gregory
Dillon Matthew
Huttley Gavin A.
Kaehler Benjamin D.
Knight Rob
Rideout Jai Ram
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Background: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. Results: We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated “novel” marker-gene sequences, are available in our extensible benchmarking framework, tax-credit (https://github.com/caporaso-lab/tax-credit-data). Conclusions: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub

Repository for Publications and Research Data

Facilitating bioinformatics reproducibility with QIIME 2 Provenance Replay.

Author: Chloe Herman
Christopher R Keefe
Colin V Wood
Elizabeth Gehret
Evan Bolyen
J Gregory Caporaso
Mary Jewell
Matthew R Dillon
Publication venue: Public Library of Science (PLoS)
Publication date: 01/11/2023

Study reproducibility is essential to corroborate, build on, and learn from the results of scientific research but is notoriously challenging in bioinformatics, which often involves large data sets and complex analytic workflows involving many different tools. Additionally, many biologists are not trained in how to effectively record their bioinformatics analysis steps to ensure reproducibility, so critical information is often missing. Software tools used in bioinformatics can automate provenance tracking of the results they generate, removing most barriers to bioinformatics reproducibility. Here we present an implementation of that idea, Provenance Replay, a tool for generating new executable code from results generated with the QIIME 2 bioinformatics platform, and discuss considerations for bioinformatics developers who wish to implement similar functionality in their software

Directory of Open Access Journals

q2-longitudinal: Longitudinal and Paired-Sample Analyses of Microbiome Data

Author: Evan Bolyen
Huilin Li
J. Gregory Caporaso
Jai Ram Rideout
Matthew R. Dillon
Nicholas A. Bokulich
Paul S. Albert
Yilong Zhang
Publication venue: 'American Society for Microbiology'
Publication date: 01/01/2018

Longitudinal sampling provides valuable information about temporal trends and subject/population heterogeneity. We describe q2-longitudinal, a software plugin for longitudinal analysis of microbiome data sets in QIIME 2. The availability of longitudinal statistics and visualizations in the QIIME 2 framework will make the analysis of longitudinal data more accessible to microbiome researchers.

Studies of host-associated and environmental microbiomes often incorporate longitudinal sampling or paired samples in their experimental design. Longitudinal sampling provides valuable information about temporal trends and subject/population heterogeneity, offering advantages over cross-sectional and pre-post study designs. To support the needs of microbiome researchers performing longitudinal studies, we developed q2-longitudinal, a software plugin for the QIIME 2 microbiome analysis platform (https://qiime2.org). The q2-longitudinal plugin incorporates multiple methods for analysis of longitudinal and paired-sample data, including interactive plotting, linear mixed-effects models, paired differences and distances, microbial interdependence testing, first differencing, longitudinal feature selection, and volatility analyses. The q2-longitudinal package (https://github.com/qiime2/q2-longitudinal) is open-source software released under a 3-clause Berkeley Software Distribution (BSD) license and is freely available, including for commercial use

Repository for Publications and Research Data

Directory of Open Access Journals

Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin.

Author: Bokulich Nicholas A
Bolyen Evan
Dillon Matthew
Gregory Caporaso J
Huttley Gavin A
Kaehler Benjamin D
Knight Rob
Rideout Jai Ram
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

BACKGROUND: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. RESULTS: We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit (https://github.com/caporaso-lab/tax-credit-data). CONCLUSIONS: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub

Repository for Publications and Research Data

Directory of Open Access Journals

eScholarship - University of California